Introduction to the course

What is this course about?

This course introduces machine learning to students in business subjects. There is a strong emphasis on predictive analytics, so that students can frame and solve business problems using data-driven machine learning algorithms. The course is self-contained and covers both theory and practice. The fundamental mathematics of machine learning will be reviewed, and students will learn Python for predictive analytics from scratch.

Theory vs practice

It can be difficult to find a good balance between theory and practice. The ultimate goal of this course is to teach students how to use machine learning in business, and to explain which models can be used and why they work.

image.png

Expected learning outcomes

By the end of this course students will be able to:

  1. Understand the fundamental issues and challenges of machine learning
  2. Appropriately choose and appraise machine learning algorithms for predictive analytics in business
  3. Effectively use Python to process, summarize and visualize business data
  4. Effectively use Python to implement machine learning algorithms to solve business problems

Topics covered

  1. Introduction
  2. Review of Business Mathematics and Statistics
  3. Introduction to Python Programming for Business Analytics
  4. Linear Regression
  5. Logistic Regression
  6. Artificial Neural Networks
  7. Naive Bayes and K-Nearest Neighbors
  8. Tree-Based Models
  9. Support-Vector Machines
  10. Cluster Analysis
  11. Review Session and Q&A for the Exam

Teaching format

image.png

Assessment

The course will be assessed by a 4-hour written exam taken on your computer. The exam will be held on Wednesday 27 July 2020, 9 am - 1 pm.

exam_pc_3.gif

The following is the question I have received most often from students recently:

math2.gif

My response:
  • The exam questions will mainly be about: (i) implementing machine learning algorithms (with your own code or the related Python libraries) to solve specific business tasks; and (ii) performing predictive analytics. You will be given datasets and an instruction file. You will need to create a Jupyter notebook using Python 3 on your computer to answer the exam questions. Your submission is the Jupyter notebook file, and you need to ensure that your Python code is clearly presented and can be executed!
  • When we study machine learning algorithms, we will look at their mathematical details. This ensures that you choose and implement the right algorithms when solving business problems, and that you interpret the results correctly. In many cases, you do not need to prove mathematical theorems or do hand-written calculations. The proofs and theorems in the preliminary assignment and the mathematics review (in Session 02) provide the mathematical foundations. As indicated in the course preliminaries, you are expected to have studied some social science mathematics in your previous or current degree programme. Reviewing those mathematical foundations will help you better understand the machine learning algorithms.

Recommended reading

  • Kevin Murphy. Machine Learning: A Probabilistic Perspective, MIT Press 2012.
  • Christopher Bishop. Pattern Recognition and Machine Learning, Springer 2007.
  • More data science and machine learning materials can be found here: https://github.com/boweichen/dsml

Programming for data analytics & model implementation

Python 3 + Jupyter notebook

Course coordinator information

I am a Lecturer (equivalent to a US tenured Assistant Professor) at the Adam Smith Business School of the University of Glasgow. I have broad research interests in the applications of probabilistic modelling and deep learning in business, with a special focus on marketing and finance. You can find more details about my research and teaching at https://boweichen.github.io/

image.png

During ISUP 2020, if you have any questions related to this course, please ask me in the workshop sessions on Zoom or get in touch with me at bc.acc@cbs.dk

Introduction to machine learning and predictive analytics

What is machine learning?

image.png (figure by Yaser Abu-Mostafa)

The term machine learning was coined in 1959 by Arthur Samuel, an American IBMer and pioneer in the field of computer gaming and artificial intelligence. He defined machine learning as a field of study that gives computers the ability to learn without being explicitly programmed. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
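As a minimal sketch of what "learning from data and making predictions" can look like, the example below fits a straight line to a handful of points and then predicts an unseen case. The data, the variable names `ad_spend` and `sales`, and the helper functions `fit_line` and `predict` are all made up for illustration; they are not part of the course material.

```python
# Hypothetical data: advertising spend (in $1000s) and resulting sales (units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(x, y):
    """Learn a slope and intercept from data by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = (co-variation of x and y) / (variation of x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Apply the learned rule to an input that was not in the data."""
    return intercept + slope * x_new


slope, intercept = fit_line(ad_spend, sales)
print("Predicted sales for a spend of 6:", predict(6.0, slope, intercept))
```

Note that the rule "sales = intercept + slope * spend" is never typed in by hand: the two numbers are estimated from the examples, which is the sense in which the program "learns".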

Machine learning applications

  • Image recognition
  • Spam classification
  • Web search
  • Voice recognition
  • Autonomous vehicles
  • ...

Machine learning applications in business

  • Consumer segmentation and preference learning
  • Product recommendation (e.g., movies, books, stocks)
  • User targeting in digital advertising
  • Inventory pricing
  • Credit card fraud detection
  • Index tracking/portfolio replication

Three main types of machine learning

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

image.png
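The difference between the first two types can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the `spend` data and labels are invented, the supervised part uses a 1-nearest-neighbour rule, and the unsupervised part uses a basic two-group k-means loop. Reinforcement learning is not shown here.

```python
# Hypothetical customer data: monthly spend in dollars.
spend = [20.0, 25.0, 30.0, 200.0, 210.0, 220.0]

# Supervised learning: every example comes with a known label.
labels = ["low", "low", "low", "high", "high", "high"]


def nearest_neighbor_label(x_new, xs, ys):
    """Predict the label of the closest labelled example (1-nearest neighbour)."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[closest]


# Unsupervised learning: no labels, only the inputs themselves.
def two_means(xs, n_iter=10):
    """Split 1-D data into two groups with a basic k-means loop (k = 2)."""
    centers = [min(xs), max(xs)]  # simple initial guess
    for _ in range(n_iter):
        groups = [[], []]
        for x in xs:
            # Assign each point to its nearest center.
            idx = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[idx].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers


print("Supervised prediction for 190:", nearest_neighbor_label(190.0, spend, labels))
groups, centers = two_means(spend)
print("Unsupervised groups:", groups)
```

In the supervised case the algorithm is told which customers are "low" or "high" spenders; in the unsupervised case it has to discover the two groups on its own.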

Machine learning applications revisited

image.png

Five tribes of machine learning

image.png

image.png

Types of analytics

image.png

O’Reilly Media Data Science Salary Survey 2014

O'Reilly conducted an anonymous survey on the tools successful data analysts and engineers use, and how those tool choices might relate to their salary. Source: https://www.oreilly.com/data/free/2014-datascience-salary-survey.csp

image.png

Tools within each cluster have high correlations, indicating that they are usually used as a combination. Although most data scientist respondents in this survey do not only use tools constrained to one of the above clusters, I do feel these clusters correspond well to the roles each data scientist plays in general:

  • Cluster 1 -- Business Intelligence
  • Cluster 2 -- Hadoop and Data Engineering
  • Cluster 3 -- Machine Learning and Data Analytics
  • Cluster 4 -- Data Visualization

Personally the tools I find most useful and interact with most frequently in daily work are from Cluster 2 + Cluster 3, and occasionally from Cluster 4 (D3, JavaScript). By Wenwen Tao, Quora Data Scientist [https://www.quora.com/What-tools-do-data-scientists-use]

Introduction to Python

What is Python?

Python is a modern, general-purpose, object-oriented, high-level programming language.

It is used for:

  • Web development (server-side)
  • Software development
  • Mathematics/statistical programming
  • System scripting
  • ...

Why use Python?

Pros

  • Open Source – free to install and use
  • Awesome online community
  • Easy to learn

Cons

Python is an interpreted language, so programs may take more CPU time than equivalent programs in a compiled language. However, given the savings in programmer time (due to ease of learning), it may still be a good choice.
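The CPU-time cost can be observed with a rough experiment. The sketch below assumes the standard CPython interpreter, where the built-in `sum()` is implemented in compiled C code; the function names `loop_sum` and `builtin_sum` are made up for this illustration. Timings differ between machines, so no expected numbers are given.

```python
import timeit


def loop_sum(n):
    """Add the numbers 0..n-1 one by one in interpreted Python code."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Compute the same result, but let the built-in sum() do the looping."""
    return sum(range(n))


n = 100_000
# Both approaches compute the same value.
assert loop_sum(n) == builtin_sum(n)

# Run each version 20 times and report the total elapsed time.
loop_time = timeit.timeit(lambda: loop_sum(n), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(n), number=20)
print(f"explicit loop: {loop_time:.4f} s, built-in sum: {builtin_time:.4f} s")
```

On most machines the explicit loop is noticeably slower, which is one reason data analysis in Python usually relies on libraries whose heavy computations run in compiled code.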

What is an integrated development environment (IDE)?

IDEs are software platforms that provide programmers and developers a comprehensive set of tools for software development in a single product. IDEs offer a central interface featuring all the tools a developer needs, including the following:

  • Code editor
  • Compiler
  • Debugger

image.png

More information can be seen here: https://www.g2.com/categories/integrated-development-environments-ide

  • PyCharm
  • Visual Studio (for Windows users)
  • Visual Studio Code (across operating systems)
  • Eclipse with PyDev Plugin
  • Spyder
  • Komodo IDE
  • Jupyter Notebook or JupyterLab
  • ...

Jupyter project

image.png

https://jupyter.org/

Installation of Anaconda

Installing Anaconda on different operating systems is very simple. You can find the detailed installation guide on the Anaconda website:

https://docs.anaconda.com/anaconda/install/

The following YouTube videos show how to install Anaconda on Windows, Mac OS and Linux (Ubuntu), respectively. You can also follow them to install Anaconda on your computer.

Installing Anaconda on Windows 10

Installing Anaconda on Mac OS

Installing Anaconda on Linux (Ubuntu)
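
Once Anaconda is installed, a quick way to check the setup is to run a short script in a Jupyter notebook cell or a Python prompt. The sketch below is only a suggestion: the package list is an assumption about what is commonly used for machine learning in Python, not an official course requirement, and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Packages assumed to be useful for this course; adjust the list as needed.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(packages):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python version:", sys.version.split()[0])
for name, available in check_environment(PACKAGES).items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```

If a package is reported as missing, it can usually be added through the Anaconda package manager; see the Anaconda documentation for details.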